{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This examples show how to use MiniSom to solve a classification problem. The classification mechanism will be implemented with MiniSom and the evaluation will make use of sklearn.\n",
    "\n",
    "First, let's load a dataset (in this case the famous Iris dataset) and apply normalization:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from minisom import MiniSom\n",
    "import numpy as np\n",
    "\n",
    "data = np.genfromtxt('iris.csv', delimiter=',', usecols=(0, 1, 2, 3))\n",
    "data = np.apply_along_axis(lambda x: x/np.linalg.norm(x), 1, data)\n",
    "labels = np.genfromtxt('iris.csv', delimiter=',', usecols=(4), dtype=str)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's naive classification function that classifies a sample in `data` using the label assigned to the associated winning neuron. A label $c$ is associated to a neuron if the majority of samples mapped in that neuron have label $c$. The function will assign the most common label in the dataset in case that a sample is mapped to a neuron for which no class is assigned."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def classify(som, data):\n",
    "    \"\"\"Classifies each sample in data in one of the classes definited\n",
    "    using the method labels_map.\n",
    "    Returns a list of the same length of data where the i-th element\n",
    "    is the class assigned to data[i].\n",
    "    \"\"\"\n",
    "    winmap = som.labels_map(X_train, y_train)\n",
    "    default_class = np.sum(list(winmap.values())).most_common()[0][0]\n",
    "    result = []\n",
    "    for d in data:\n",
    "        win_position = som.winner(d)\n",
    "        if win_position in winmap:\n",
    "            result.append(winmap[win_position].most_common()[0][0])\n",
    "        else:\n",
    "            result.append(default_class)\n",
    "    return result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can 1) split the data in train and test set, 2) train the som, 3) print the classification report that contains all the metrics to evaluate the results of the classification."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "              precision    recall  f1-score   support\n",
      "\n",
      "      setosa       1.00      1.00      1.00        13\n",
      "  versicolor       0.92      1.00      0.96        12\n",
      "   virginica       1.00      0.92      0.96        13\n",
      "\n",
      "    accuracy                           0.97        38\n",
      "   macro avg       0.97      0.97      0.97        38\n",
      "weighted avg       0.98      0.97      0.97        38\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.metrics import classification_report\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(data, labels, stratify=labels)\n",
    "\n",
    "som = MiniSom(7, 7, 4, sigma=3, learning_rate=0.5, \n",
    "              neighborhood_function='triangle', random_seed=10)\n",
    "som.pca_weights_init(X_train)\n",
    "som.train_random(X_train, 500, verbose=False)\n",
    "\n",
    "print(classification_report(y_test, classify(som, X_test)))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
